video/tts audio clip #668

Closed
Peggy0422 wants to merge 6 commits into onvif:development from Peggy0422:video/TTS-audio-clip
@Peggy0422

To support audio products with a TTS (text-to-speech) function, several changes are needed:

  1. TTSCapabilities (optional): Add the complex type TTSCapabilities, as optional, to the existing complex type "AudioClipCapabilities". It indicates whether the device supports the TTS function and, if so, the detailed configuration it supports.
    Parameters:
    MaxContentLength: the maximum length of the text content that the device can convert into an audio clip.
    TTSLanguage: the languages the device supports, from which the client chooses one to perform TTS.
    TTSVoiceType: the voice types the device supports when it plays an audio clip converted from text.

  2. Add the "AddTTSAudioClip" and "AddTTSAudioClipResponse" elements: these send a text and its configuration to a device that supports TTS, so that the device can convert the text into an audio clip and play it according to Configuration and TTSConfiguration.
    Parameters:
    Token (optional): token for the audio clip.
    Configuration: audio clip configuration to add; see Configuration for AddAudioClip.
    TTSConfiguration: configuration for the TTS audio clip to add. It specifies the audio content, language, and voice type used when the device plays this audio clip.
    Response:
    Token: unique token of the TTS audio clip to be uploaded.
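
As a rough illustration of the AddTTSAudioClip exchange described above, a request and response might look like the sketch below. The `tr2` prefix is taken from the schema snippet in this PR and is bound here to the Media2 service namespace. The token value, the `en-US` language, the `Female` voice type, and the element (rather than attribute) layout of TTSConfiguration are assumptions for illustration only. The contents of Configuration are omitted because they are not shown in this description.

```xml
<!-- Hypothetical request: all values are placeholders, not taken from the PR -->
<tr2:AddTTSAudioClip xmlns:tr2="http://www.onvif.org/ver20/media/wsdl">
  <!-- Optional token for the audio clip -->
  <tr2:Token>clip_tts_1</tr2:Token>
  <tr2:Configuration>
    <!-- Same content as Configuration in AddAudioClip (omitted here) -->
  </tr2:Configuration>
  <tr2:TTSConfiguration>
    <!-- Text the device converts into an audio clip -->
    <tr2:Content>Attention please, the building will close in ten minutes.</tr2:Content>
    <!-- Assumed values; the real TTSLanguage and TTSVoiceType enumerations are defined in the PR -->
    <tr2:Language>en-US</tr2:Language>
    <tr2:VoiceType>Female</tr2:VoiceType>
  </tr2:TTSConfiguration>
</tr2:AddTTSAudioClip>

<!-- Hypothetical response: returns the token of the TTS audio clip -->
<tr2:AddTTSAudioClipResponse xmlns:tr2="http://www.onvif.org/ver20/media/wsdl">
  <tr2:Token>clip_tts_1</tr2:Token>
</tr2:AddTTSAudioClipResponse>
```

A client would be expected to keep Content within MaxContentLength and to pick Language and VoiceType from the values the device advertises in TTSCapabilities.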

media2.wsdl

  1. Added AddTTSAudioClip request and AddTTSAudioClip response for sending a text and its TTS configuration to the device.
  2. Added complex type "TTSAudio" for TTSConfiguration to support the TTS function. It includes the parameters Content, Language, and VoiceType.
  3. Updated AudioClipCapabilities with TTSCapabilities, and added the complex type TTSCapabilities to indicate that the device supports the TTS function and its corresponding configuration.
    The complex type TTSCapabilities includes MaxContentLength, TTSLanguage, and TTSVoiceType.
  4. Added simpleTypes TTSLanguage and TTSVoiceType.
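
For orientation, a device advertising the new capability might return something along the lines of the fragment below. This is only a sketch: the use of child elements for MaxContentLength, TTSLanguage, and TTSVoiceType, the instance element name, and every value shown are assumptions, since the actual schema definitions are in the PR diff and not in this description. MaxAudioClipLimit is shown as an attribute of AudioClipCapabilities, as mentioned in the review discussion.

```xml
<!-- Hypothetical capability fragment: layout and values are assumptions -->
<tr2:AudioClipCapabilities xmlns:tr2="http://www.onvif.org/ver20/media/wsdl"
                           MaxAudioClipLimit="10">
  <!-- Optional: present only when the device supports TTS -->
  <tr2:TTSCapabilities>
    <!-- Maximum length of text the device can convert into an audio clip -->
    <tr2:MaxContentLength>500</tr2:MaxContentLength>
    <!-- Languages and voice types the client may choose from -->
    <tr2:TTSLanguage>en-US</tr2:TTSLanguage>
    <tr2:TTSVoiceType>Female</tr2:TTSVoiceType>
  </tr2:TTSCapabilities>
</tr2:AudioClipCapabilities>
```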

media2.xml and documentation

  1. Added detailed descriptions for the AddTTSAudioClip operation, explaining its purpose, parameters, and responses.
  2. Updated audio clip Capabilities with TTSCapabilities.
    ONVIF-Media2-Service-Spec-TTS update.docx

1. Added AddTTSAudioClip request and AddTTSAudioClip response for sending a text and its TTS configuration to the device (1621-1652) (2036-2041) (2418-2422) (2935-2943).
2. Added complex type "TTSAudio" (1465-1485) for TTSConfiguration to support the TTS function. It includes the parameters Content, Language, and VoiceType.
3. Updated AudioClipCapabilities with TTSCapabilities (177-181), and added the complex type TTSCapabilities (201-220) to indicate that the device supports the TTS function and its corresponding configuration.
The complex type TTSCapabilities includes MaxContentLength, TTSLanguage, and TTSVoiceType.
4. Added simpleTypes TTSLanguage (220-231) and TTSVoiceType (232-238).
1. Added detailed descriptions for the AddTTSAudioClip operation, explaining its purpose, parameters, and responses (2359-2416).
2. Updated audio clip Capabilities with TTSCapabilities (2698-2700).
update code line information for TTS function
correct some editorial errors
<xs:documentation>Audio clip configuration to add.</xs:documentation>
</xs:annotation>
</xs:element>
<xs:element name="TTSConfiguration" type="tr2:TTSAudio">
Contributor

Is the TTSConfiguration for an audio clip returned in the GetAudioClips API response? If not, how can a client query the TTSConfiguration for a given audio clip?

Author

No, the GetAudioClips API response does not return a TTSConfiguration for an audio clip. The TTS configuration is only used by the device to convert a text into an audio clip, which is then stored on the device just like other audio clips. So far, there is no use case for querying TTSConfiguration in the GetAudioClips API response. To distinguish TTS audio clips from pre-recorded audio clips, a client could use the "name" element.

<varlistentry>
<term>faults</term>
<listitem>
<para role="param">env:Receiver - ter:Action - ter:MaxAudioClipLimit</para>
Collaborator

I propose renaming ter:MaxAudioClipLimit to ter:MaxAudioClip to be uniform with similar errors for other functions.

Contributor

The MaxAudioClipLimit parameter was added as part of the Audio Clip Management feature, and the technical specification for this feature was released in ONVIF V25.06. Changing the parameter name now could cause backward compatibility issues.

</xs:complexType>
<!--===============================-->
<!--=============TTS Capability=================-->
<xs:complexType name="TTSCapabilities">
Collaborator

Should we also have the maximum number of clips? Since the device can return ter:MaxAudioClip, the limit should be available as a capability.

Author

A TTS audio clip is actually an audio clip. AudioClipCapabilities already has an attribute "MaxAudioClipLimit", which also covers TTS audio clips.

@Peggy0422 Peggy0422 marked this pull request as ready for review December 1, 2025 06:18
Updated the description of the AddTTSAudioClip operation to clarify the parameters and response. Updated the description of TTSCapabilities.
@ocampana-videotec
Collaborator

@Peggy0422 I do not understand the relationship between this PR, #692, and #694. Which is the right one?

@sujithhanwha
Contributor

Closing this PR since a new PR is already open for the same feature.
